# 37 Diagonalization: Part 2 - Characterizations of diagonalization, eigenbasis, eigenspaces, algebraic and geometric multiplicities.
We are discussing diagonalizability, and in this set of notes we will develop characterizations of when a square matrix is diagonalizable.
## By definition of similarity and the factorization $PDP^{-1}$.
Firstly, by the definition, we say
> **Definition.**
> A square matrix $A$ is diagonalizable if $A$ is similar to some diagonal matrix $D$, that is, $A\stackrel{\text{sim}}\sim D$ with $D$ diagonal.
To rephrase this using the definition of similarity, this means
> **Factorization characterization.**
> A square matrix $A$ is diagonalizable if and only if there exists an invertible matrix $P$ and a diagonal matrix $D$ such that we can _factorize_ $A = P D P^{-1}$.
Of course, the challenge is producing such invertible $P$ and diagonal $D$, which we will tackle in a bit.
But first, why is a diagonalizable matrix useful?
Suppose you want to compute a large power of a matrix, say $A^{1000}$. How might we calculate this?
Here is a _naive but smart way_: One could approach this logarithmically, by first computing $A^{2}$, then its square $A^{4}$, then the square of that $A^{8}$, etc. This allows us to compute powers of $A$ whose exponents are powers of $2$ easily, namely $A^{(2^{k})}$. And since $1000 = 512 + 256 + 128 + 64 + 32 + 8$ (binary expansion), we simply have $A^{1000} = A^{512} A^{256} A^{128} A^{64} A^{32} A^{8}$. This greatly reduces the number of multiplications we need to perform.
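The repeated-squaring idea above can be sketched in a few lines. This is a hypothetical helper (the name `matrix_power_by_squaring` is mine, not a standard routine), written with numpy:

```python
import numpy as np

def matrix_power_by_squaring(A, k):
    """Compute A^k using repeated squaring: O(log k) matrix multiplications."""
    result = np.eye(A.shape[0])
    base = A.astype(float)
    while k > 0:
        if k & 1:            # current binary digit of k is 1: fold this power in
            result = result @ base
        base = base @ base   # square: A, A^2, A^4, A^8, ...
        k >>= 1
    return result
```

For $k=1000$ this uses roughly $\log_2 1000 \approx 10$ squarings plus a handful of extra multiplications, instead of 999 multiplications.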
However, if our matrix $A$ is diagonalizable, then something nice happens: We can factorize $A = PDP^{-1}$, then $$
A^{1000} = (PDP^{-1})^{1000} = (PDP^{-1})(PDP^{-1})\cdots (PDP^{-1})= PD^{1000}P^{-1},
$$because matrix multiplication is _associative_: the inner $P^{-1}P$ pairs all cancel. Effectively, we trade the problem of computing $A^{1000}$ for that of computing $D^{1000}$. And since $D$ is diagonal, its power is simple to find:
> If $D$ is a diagonal matrix with diagonal entries given by $D=\text{diag}(\lambda_{1},\lambda_{2},\ldots,\lambda_{n})$, that is, $$
D = \begin{pmatrix}\lambda_{1}\\&\lambda_{2}\\&&\ddots\\&&&\lambda_{n}\end{pmatrix}
$$ then its $k$-th power is just $D^{k}=\text{diag}(\lambda_{1}^{k},\lambda_{2}^{k},\ldots,\lambda_{n}^{k})$, or $$
D^{k} = \begin{pmatrix}\lambda_{1}^{k}\\&\lambda_{2}^{k}\\&&\ddots\\&&&\lambda_{n}^{k}\end{pmatrix}
$$ raising each diagonal entry to the $k$-th power.
**CAUTION.** You cannot simply raise the entries of a matrix $A$ to the $k$-th power to compute $A^{k}$! This shortcut is special to diagonal matrices!
Bottom-line: If a square matrix $A$ is diagonalizable, its power $A^{k}$ is easy to compute.
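As a quick numerical illustration of both the diagonal-power rule and the caution above, here is a small numpy check (the matrices are my own examples):

```python
import numpy as np

# Diagonal case: matrix power = entrywise power of the diagonal.
lam = np.array([2.0, 1.0, 0.5])
D = np.diag(lam)
print(np.allclose(np.linalg.matrix_power(D, 6), np.diag(lam**6)))   # True

# Non-diagonal case: entrywise powers do NOT give the matrix power.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
print(np.allclose(np.linalg.matrix_power(A, 2), A**2))              # False: A@A = [[1,2],[0,1]]
```

Note that in numpy, `A**2` is the entrywise square, while `np.linalg.matrix_power(A, 2)` is the true matrix product `A @ A`.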
**Remark.** Some verbiage: we sometimes say "to _diagonalize_ a matrix" to mean to produce the factorization $PDP^{-1}$ for the matrix, with $P$ invertible and $D$ diagonal. We also refer to the factorization $PDP^{-1}$ as the _diagonalization_ of $A$.
## Eigenbasis characterization.
Let us analyze carefully what it means for an $n\times n$ square matrix to be diagonalizable column by column. If $A = PDP^{-1}$ for some invertible $P$ and diagonal $D$, let's say $$
P=\begin{pmatrix}| & | & & | \\
\vec v_{1} & \vec v_{2} & \cdots & \vec v_{n} \\
| & | & & |\end{pmatrix}, D = \begin{pmatrix}\lambda_{1} \\
& \lambda_{2} \\
& & \ddots \\
& & & \lambda_{n}\end{pmatrix}
$$ then working out $AP=PD$ gives $$
\begin{pmatrix}| & | & & | \\
A\vec v_{1} & A\vec v_{2} & \cdots & A\vec v_{n} \\
| & | & & |\end{pmatrix} = \begin{pmatrix}| & | & & | \\
\lambda_{1}\vec v_{1} & \lambda_{2}\vec v_{2} & \cdots & \lambda_{n}\vec v_{n} \\
| & | & & |\end{pmatrix}
$$as we saw before. Since $P$ is invertible, its columns $\vec v_{1},\vec v_{2},\ldots,\vec v_{n}$ form a _basis_. And since each $\vec v_{i}$ satisfies $A\vec v_{i} = \lambda_{i} \vec v_{i}$, we see that each of these basis vectors $\vec v_{i}$ is furthermore an _eigenvector_ of $A$. This set of vectors $\beta = \{\vec v_{1},\vec v_{2},\ldots,\vec v_{n}\}$ is called an **eigenbasis** to the matrix $A$: it is a basis whose vectors are all eigenvectors of $A$. So, this gives another characterization of diagonalizability,
> **Eigenbasis characterization.**
> Let $A$ be an $n\times n$ matrix over the scalars $\mathbb F$. Then $A$ is diagonalizable if and only if there exists an eigenbasis $\beta = \{\vec v_{1},\ldots,\vec v_{n}\}$ to $A$, that is, the $n$ vectors $\vec v_{1},\ldots,\vec v_{n}$ form a basis of $\mathbb{F}^{n}$ and each $\vec v_{i}$ is an eigenvector of $A$.
> Further, if $\beta =\{\vec v_{1},\ldots,\vec v_{n}\}$ is an eigenbasis to $A$, then $A$ is diagonalized as $PDP^{-1}$ where $$
P=\begin{pmatrix}| & | & & | \\
\vec v_{1} & \vec v_{2} & \cdots & \vec v_{n} \\
| & | & & |\end{pmatrix}, D = \begin{pmatrix}\lambda_{1} \\
& \lambda_{2} \\
& & \ddots \\
& & & \lambda_{n}\end{pmatrix}
$$with each $\lambda_{i}$ the corresponding eigenvalue such that $A\vec v_{i} = \lambda_{i} \vec v_{i}$.
**Example.**
Let us consider the $2\times 2$ matrix $A$ where $$
A = \begin{pmatrix}2 & 1 \\ 0 &1\end{pmatrix}.
$$Consider the basis set $\beta = \{\vec v_{1},\vec v_{2}\}$ for $\mathbb R^{2}$ where $$
\vec v_{1} = \begin{pmatrix}1\\0\end{pmatrix} \quad\text{and}\quad \vec v_{2}=\begin{pmatrix}1\\-1\end{pmatrix}
$$
Note that the basis vectors are in fact eigenvectors of $A$: $$
\begin{align*}
A\vec v_{1} = \begin{pmatrix}2 & 1\\0 & 1\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix} = 2\begin{pmatrix}1\\0\end{pmatrix}=2\vec v_{1} \\
A\vec v_{2} = \begin{pmatrix}2 & 1\\0 & 1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix} = 1\begin{pmatrix}1\\-1\end{pmatrix}=1\vec v_{2}
\end{align*}
$$Hence $\beta$ is an eigenbasis to $A$, and $A$ is therefore diagonalizable! The _diagonalization_ of $A$ is given by $A = PDP^{-1}$ where $$
P = \begin{pmatrix}1 & 1 \\
0 & -1\end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix}2 & 0 \\0 & 1\end{pmatrix}. \blacklozenge
$$Great, if we can find an eigenbasis to a square matrix $A$, then we have shown $A$ is diagonalizable, and can even write down its diagonalization $PDP^{-1}$.
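If numpy is available, the factorization from this example can be verified numerically, along with the fast-power trick from earlier; this is just a sanity-check sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 1.0]])
P = np.array([[1.0, 1.0], [0.0, -1.0]])   # eigenvectors v1, v2 as columns
D = np.diag([2.0, 1.0])

# Check the factorization A = P D P^{-1}
print(np.allclose(A, P @ D @ np.linalg.inv(P)))         # True

# Fast power: A^10 = P D^10 P^{-1}, with D^10 computed entrywise
A10 = P @ np.diag(np.array([2.0, 1.0])**10) @ np.linalg.inv(P)
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))  # True
```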
But how do we find this eigenbasis, if it exists at all?
Also curiously, the matrix $A = \begin{pmatrix}2&1\\0&1\end{pmatrix}$ is diagonalizable, but $B = \begin{pmatrix}2&1\\0&2\end{pmatrix}$ turns out to be not diagonalizable! How do we know?
For this we need to develop how to find eigenvectors -- they live in eigenspaces.
## Interlude -- brief words about polynomials.
Let $p(t) =a_{n}t^{n}+a_{n-1}t^{n-1}+\cdots+a_{0}$ be a polynomial of degree $n$. Suppose $\lambda$ is a root of $p$ that is repeated a total of $m$ times, meaning $p(t)=q(t)(t-\lambda)^{m}$ for some polynomial $q$ with $q(\lambda)\neq 0$. Then we say this $m$ is the **algebraic multiplicity of $\lambda$**, and denote it as $\text{am}(\lambda)$.
For example, the polynomial $p(t)=4(2-t)^{5}(8+t)^{3}t^{7}$ has roots $2,-8,0$ where their algebraic multiplicities are $$
\begin{array}{}
\text{root } \lambda & 2 & -8 & 0 \\
\text{am}(\lambda) & 5 & 3 & 7
\end{array}
$$
If a polynomial $p(t)$ over $\mathbb F$ can be factored completely into a product of linear terms, like $$
p(t)=c (\lambda_{1}-t)^{m_{1}}(\lambda_{2}-t)^{m_{2}}\cdots(\lambda_{k}-t)^{m_{k}}
$$where $c,\lambda_{1},\ldots,\lambda_{k}$ are all scalars in $\mathbb F$, then we say the polynomial $p$ **splits** over $\mathbb F$.
There are polynomials over the reals that do not split over the reals; for example, $p(t)=t^{2}+1$ cannot be factored over the reals into a product of linear terms. However, over the complex numbers this is a _non-issue_: _every complex polynomial splits over $\mathbb C$_. This is because of:
> **Fundamental theorem of algebra.**
> A nonconstant polynomial $p(t)$ over $\mathbb C$ always has a root in $\mathbb C$. In particular, a degree $n$ polynomial $p(t)$ over $\mathbb C$ splits over $\mathbb C$ as a product of $n$ linear terms, where some terms may repeat.
**Example.**
The polynomial $p(t)=t^{2}+1$ does not split over $\mathbb R$, but splits over $\mathbb C$. Over the complex numbers we have $p(t) = (t-i)(t+i)$.
One thing to note is that if a degree $n$ polynomial $p$ splits, then summing over all the algebraic multiplicities across the distinct roots of $p$ gives $$
\sum_{\lambda} \text{am}(\lambda) =n=\deg(p),
$$
and in general $\sum_{\lambda}\text{am}(\lambda) \le n$.
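This multiplicity bookkeeping can be checked with a computer algebra system. For instance, sympy's `roots` returns each root together with its algebraic multiplicity; here is a sketch using the example polynomial from above:

```python
import sympy as sp

t = sp.symbols('t')
p = 4*(2 - t)**5*(8 + t)**3*t**7        # the example polynomial, degree 15
ams = sp.roots(p, t)                    # dict: {root: algebraic multiplicity}
print(ams)                              # root 2 has am 5, root -8 has am 3, root 0 has am 7

# p splits (it is already a product of linear terms), so the am's sum to deg p
assert sum(ams.values()) == sp.degree(p, t)
```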
## Eigenspaces.
Let us revisit how we find eigenvalues again, and why they come from roots of the characteristic polynomial.
Suppose $\lambda$ is an eigenvalue to some square matrix $A$, then there exists some nonzero vector $\vec v\neq 0$ such that $$
\begin{array}{}
& A\vec v = \lambda \vec v, & \vec v\neq 0 \\
\iff & (A -\lambda I)\vec v = \vec 0, & \vec v \neq 0 \\
\iff & \vec v\in \ker (A-\lambda I), & \vec v\neq 0 \\
\iff & A - \lambda I \text{ is not invertible} \\
\iff & \det(A - \lambda I) = 0
\end{array}
$$from which we see that $\lambda$ is a root of the characteristic polynomial $p_{A}(t) = \det(A-tI)$.
But this deduction reveals even more: it shows that any eigenvector $\vec v$ associated to the eigenvalue $\lambda$ lives in $\ker(A-\lambda I)$. So we denote $$
E_{\lambda} = \ker (A-\lambda I)
$$as the **eigenspace** of the eigenvalue $\lambda$, where every eigenvector with eigenvalue $\lambda$ is found in $E_{\lambda}$. In fact, every vector in $E_{\lambda}$ except the zero vector is an eigenvector of $A$ with eigenvalue $\lambda$, and these are _all_ of the eigenvectors with eigenvalue $\lambda$.
Also note that as $E_{\lambda}$ is the kernel of a matrix, it is a _linear space_, hence the terminology eigen_space_. And since it is a linear space, we can find a basis to $E_{\lambda}$ as well as its dimension, and we define the **geometric multiplicity of $\lambda$** to be the dimension of $E_{\lambda}$, with shorthand $\text{gm}(\lambda) = \dim(E_{\lambda})$.
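Since $E_{\lambda}$ is a kernel, a basis for it (and hence $\text{gm}(\lambda)$) can be computed exactly with sympy's `nullspace`; a sketch, using the matrix $B$ from earlier:

```python
import sympy as sp

B = sp.Matrix([[2, 1], [0, 2]])
lam = 2
E = (B - lam*sp.eye(2)).nullspace()   # basis of the eigenspace E_2 = ker(B - 2I)
print(E)                              # one basis vector: (1, 0)
gm = len(E)                           # geometric multiplicity = dim E_2
print(gm)                             # 1
```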
An important and useful fact,
> If $\lambda$ is an eigenvalue of $A$, then we always have $$
1 \le \text{gm}(\lambda) \le \text{am}(\lambda).
$$
(I will prove this separately in another note.)
Now, since to diagonalize a square matrix $A$ is to produce an eigenbasis to $A$, what we should do then is the following:
- First find all the eigenvalues $\lambda$ of $A$ by solving for the roots of the characteristic polynomial $p_{A}(t)=\det(A-t I)$.
- For each eigenvalue $\lambda$, compute the eigenspace $E_{\lambda}$ (that's where we can find the eigenvectors), and write down a _basis_ for each $E_{\lambda}$.
- See if we can put together an eigenbasis to $A$ by using the bases to each $E_{\lambda}$.
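These three steps are exactly what sympy's `eigenvects` and `diagonalize` automate; a sketch, using the matrix $A$ from the earlier example:

```python
import sympy as sp

A = sp.Matrix([[2, 1], [0, 1]])

# Steps 1 & 2: eigenvects returns (eigenvalue, am, basis of E_lambda) triples
for lam, am, basis in A.eigenvects():
    print(lam, am, [list(v) for v in basis])

# Step 3: diagonalize assembles P and D from the eigenspace bases
P, D = A.diagonalize()
assert A == P * D * P.inv()
```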
Let us see this in action.
**Example.**
Consider the matrices $$
A = \begin{pmatrix}2 & 1\\0 & 1\end{pmatrix} \quad\text{and}\quad B=\begin{pmatrix}2 & 1\\0 & 2\end{pmatrix}
$$Let us analyze the diagonalizability of the matrix $A$.
First its characteristic polynomial is $$
p_{A}(t) = \det(A-tI)=\det\begin{pmatrix}2-t & 1 \\ 0 & 1-t\end{pmatrix} = (2-t)(1-t)
$$so we see the roots of $p_{A}$ are $2,1$, which are the eigenvalues of $A$. Their algebraic multiplicities are $1$ each.
Now, for each eigenvalue $\lambda$, we will compute its eigenspace $E_{\lambda}$ and write down a basis for $E_{\lambda}$. Note $$
E_{2} = \ker (A-2I)=\ker \begin{pmatrix}0 & 1\\0 & -1\end{pmatrix}= \operatorname{span} \{\begin{pmatrix}1\\0\end{pmatrix}\}
$$and $$
E_{1} = \ker (A-I) = \ker \begin{pmatrix}1 & 1\\0 & 0\end{pmatrix} = \operatorname{span} \{\begin{pmatrix}-1\\1\end{pmatrix}\}.
$$So we can organize our information into this _eigen-table_ (this is a made up term): $$
\begin{array}{c|c|c}
\lambda & 2 & 1 \\\hline
\text{am}(\lambda) & 1 & 1 \\
\text{basis for } E_{\lambda} & \begin{pmatrix}1\\0\end{pmatrix} & \begin{pmatrix}-1 \\ 1\end{pmatrix} \\
\text{gm}(\lambda) & 1 & 1
\end{array}
$$
Now let's see, do we have enough linearly independent eigenvectors to form a basis for $\mathbb R^{2}$? Yes, we have two of them, so $$
\begin{pmatrix}1\\0\end{pmatrix},\begin{pmatrix}-1\\1\end{pmatrix}
$$form an eigenbasis to $A$. So $A$ is diagonalizable, and we can diagonalize $A$ as $PDP^{-1}$ where $$
P = \begin{pmatrix}1 & -1 \\0 & 1\end{pmatrix} \text{ and }D=\begin{pmatrix}2\\ & 1\end{pmatrix}.
$$
As for matrix $B$, note its characteristic polynomial is $$
p_{B}(t) = \det(B-tI)= \det \begin{pmatrix}2-t & 1\\0 & 2-t\end{pmatrix} = (2-t)^{2}.
$$So there is only one eigenvalue, $2$, with algebraic multiplicity $2$. The eigenspace $E_{2}$ is $$
E_{2} = \ker \begin{pmatrix}0 & 1\\0 & 0\end{pmatrix} = \operatorname{span} \{\begin{pmatrix}1 \\ 0\end{pmatrix}\}
$$So we have _eigen-table_ for $B$ as follows $$
\begin{array}{c|c}
\lambda & 2\\ \hline
\text{am}(\lambda) & 2 \\
\text{basis for }E_{\lambda} & \begin{pmatrix}1\\0\end{pmatrix} \\
\text{gm}(\lambda) & 1
\end{array}
$$
This is an issue: we don't have enough linearly independent eigenvectors to form a basis of $\mathbb R^{2}$! There are no more eigenspaces to use, and the one available eigenspace $E_{2}$ is not big enough. Therefore no eigenbasis to $B$ is possible, and $B$ is not diagonalizable!
By the way, in this case where the geometric multiplicity is not big enough, that is, $\text{gm}(2) < \text{am}(2)$, we say the eigenvalue $2$ is **defective** or **deficient**.
$\blacklozenge$
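For what it's worth, sympy reaches the same verdict on $B$, and its `eigenvects` output makes the defect visible (one eigenvalue with $\text{am}=2$ but only one basis vector):

```python
import sympy as sp

B = sp.Matrix([[2, 1], [0, 2]])
print(B.is_diagonalizable())   # False: gm(2) = 1 < am(2) = 2
# one (eigenvalue, am, basis) triple, with a single basis vector
print(B.eigenvects())          # [(2, 2, [Matrix([[1],[0]])])]
```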
This shows that to be diagonalizable, the geometric multiplicities cannot be deficient: they must all be big enough that they add up to $n$, where $A$ is $n\times n$, so that we can assemble a basis. Hence we have
> **Eigenspace characterization.**
> An $n\times n$ matrix $A$ is diagonalizable if and only if the geometric multiplicities across the distinct eigenvalues sum to $n$, that is, $$
\sum_{\lambda} \text{gm}(\lambda) = n.
$$
Note in general for any square matrix, $\sum_{\lambda}\text{gm}(\lambda) \le n$.
There is one more observation we can make. If the characteristic polynomial does not split, then the sum of the algebraic multiplicities will be strictly less than $n$. And since each geometric multiplicity is no greater than the corresponding algebraic multiplicity, we see that a non-splitting characteristic polynomial leads to non-diagonalizability. So, we can rephrase this characterization as
> **Multiplicities characterization.**
> An $n\times n$ matrix $A$ is diagonalizable over $\mathbb F$ if and only if both (i) the characteristic polynomial $p_{A}$ splits over $\mathbb F$, and (ii) for each eigenvalue $\lambda$, we have $\text{am}(\lambda) = \text{gm}(\lambda)$.
Again, if we are working over the complex numbers, splitting polynomials is not a problem as they will always split over $\mathbb C$.
Let us look at some examples.
**Example.**
Consider the rotation by $90^{\circ}$ CCW matrix $R$, where $$
R = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}.
$$
Its characteristic polynomial is $$
p_{R}(t) = \det \begin{pmatrix}-t & -1\\1 & -t\end{pmatrix} = t^{2} +1.
$$
Note, $p_{R}$ does not split over the reals. So, $R$ is **not** diagonalizable over $\mathbb R$.
However, over the complex numbers, we can factor $$
p_{R}(t) = (t-i)(t+i)
$$so the eigenvalues of $R$ over $\mathbb C$ are $i$ and $-i$, each with algebraic multiplicity $1$. Since we always have $1 \le\text{gm}(\lambda)\le\text{am}(\lambda)$, we can fill in the _eigen-table_ at least partially $$
\begin{array}{}
\lambda & i & -i \\
\text{am}(\lambda) & 1 & 1 \\
\text{basis for }E_{\lambda} & ? & ? \\
\text{gm}(\lambda) & 1 & 1
\end{array}
$$as the geometric multiplicities are forced: since $1 \le\text{gm}(\pm i)\le\text{am}(\pm i)=1$, we must have $\text{gm}(\pm i)=1$.
So, $p_{R}$ splits over $\mathbb C$ and $\text{am}(\lambda) =\text{gm}(\lambda)$ for each eigenvalue (no defective eigenvalues), hence $R$ is diagonalizable over $\mathbb C$! Alternatively, we also have $\sum_{\lambda}\text{gm}(\lambda) =2$, which leads to the same conclusion.
If we further want the diagonalization of $R$ over $\mathbb C$, then we need to compute basis sets to the eigenspaces. Note $$
E_{i} = \ker \begin{pmatrix}-i & -1\\1 & -i\end{pmatrix} = \ker \begin{pmatrix}1 & -i\\0 & 0\end{pmatrix} = \operatorname{span} \{\begin{pmatrix}i\\1\end{pmatrix}\}
$$and $$
E_{-i} = \ker \begin{pmatrix}i & -1\\1 & i\end{pmatrix} = \ker \begin{pmatrix}1 & i\\0 & 0\end{pmatrix} = \operatorname{span} \{\begin{pmatrix}-i \\1\end{pmatrix}\}
$$So we have diagonalization $R=PDP^{-1}$ with $$
P = \begin{pmatrix}i & -i \\ 1 & 1\end{pmatrix} \text{ and } D=\begin{pmatrix}i\\ & -i\end{pmatrix}. \quad\blacklozenge
$$
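Numerically, numpy's `eig` happily works over $\mathbb C$ and recovers this diagonalization, up to scaling of the eigenvectors and ordering of the eigenvalues (numpy normalizes eigenvectors, so its $P$ will differ from the hand-computed one, but the factorization still holds):

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])
lam, P = np.linalg.eig(R)     # complex eigenvalues and eigenvector matrix
print(lam)                    # eigenvalues are +/- i (order may vary)
D = np.diag(lam)
print(np.allclose(R, P @ D @ np.linalg.inv(P)))   # True
```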
**Example.**
Consider the matrix $A$ where $$
A = \begin{pmatrix}0 & 1 & 1\\2 & 1 & 2\\3 & 3 & 2\end{pmatrix}.
$$You are given that $A$ has eigenvalues $-1$ and $5$. Determine if $A$ is diagonalizable, and if so, diagonalize it.
$\blacktriangleright$ Let us try to fill in this _eigen-table_ $$
\begin{array}{}
\lambda & -1 & 5 \\
\text{am}(\lambda) & ? & ? \\
\text{basis for } E_{\lambda} & ? & ? \\
\text{gm}(\lambda) & ? & ?
\end{array}
$$
We could use long division to figure out the algebraic multiplicities of the given eigenvalues, and to see if there are any missing eigenvalues. But since we have to compute $E_{\lambda}$ anyway, we start with that. Note $$
E_{-1} = \ker \begin{pmatrix}1 & 1 & 1\\2 & 2 & 2\\3 & 3 & 3\end{pmatrix} = \ker \begin{pmatrix}1 & 1 & 1\\0 & 0 & 0\\0 & 0 & 0\end{pmatrix} = \operatorname{span} \{\begin{pmatrix}-1\\1\\0\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix}\}.
$$This shows $\text{gm}({-1})=2$. But what about $\text{gm}(5)$? We know $\text{gm}(5) \ge 1$ and that $\text{gm}(-1) +\text{gm}(5) \le 3$, as $A$ is a $3\times 3$ matrix. So we are forced to have $\text{gm}(5)=1$, and the sum $$
\text{gm}(-1) + \text{gm}(5) = 3.
$$Hence we conclude that $A$ is diagonalizable!
Since we also want the diagonalization, we continue to find $E_{5}$. We have $$
E_{5} = \ker \begin{pmatrix}-5 & 1 & 1 \\
2 & -4 & 2\\3 & 3 & -3\end{pmatrix} = \ker \begin{pmatrix}1 & 0 & -\frac{1}{3} \\
0 & 1 & -\frac{2}{3} \\
0 & 0 & 0\end{pmatrix} = \operatorname{span}\left\{ \begin{pmatrix} \frac{1}{3} \\ \frac{2}{3}\\1\end{pmatrix} \right\}.
$$So we can fill in our _eigen-table_ $$
\begin{array}{}
\lambda & -1 & 5 \\
\text{am}(\lambda) & (2) & (1) \\
\text{basis for } E_{\lambda} & \begin{pmatrix}-1\\1\\0\end{pmatrix},\begin{pmatrix}-1\\0\\1\end{pmatrix} & \begin{pmatrix} \frac{1}{3} \\ \frac{2}{3}\\1\end{pmatrix} \\
\text{gm}(\lambda) & 2 & 1
\end{array}
$$which gives the diagonalization $A = PDP^{-1}$ with $$
P = \begin{pmatrix}-1 & -1 & \frac{1}{3} \\
1 & 0 & \frac{2}{3} \\
0 & 1 & 1\end{pmatrix} \text{ and } D = \begin{pmatrix}-1 \\
& -1 \\
& & 5\end{pmatrix}. \quad\blacklozenge
$$In the next set of notes, we will see some applications and miscellaneous useful facts about diagonalization, eigenvalues, and the characteristic polynomial.
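As a closing sanity check, the diagonalization from this last example can be verified exactly with sympy (using `Rational` to keep the fractions exact):

```python
import sympy as sp

A = sp.Matrix([[0, 1, 1], [2, 1, 2], [3, 3, 2]])
P = sp.Matrix([[-1, -1, sp.Rational(1, 3)],
               [ 1,  0, sp.Rational(2, 3)],
               [ 0,  1, 1]])
D = sp.diag(-1, -1, 5)
print(A == P * D * P.inv())   # True: the factorization checks out exactly
```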